The Turkish Statistical Institute (TÜİK) is motivated to ensure the production, publication and distribution of necessary statistics while compiling Turkey’s education data and information. TÜİK collects data from individuals, households, workplaces through surveys and censuses. The education data it collects helps to compare education information and gives an idea of what education needs a country has for its development.
As part of the EMU430 course, we decided to examine the National Education Statistics dataset published by the Turkish Statistical Institute (TÜİK). This dataset provides comprehensive information about the state of education in Turkey. It is particularly suitable for analyzing issues such as reading rates, gender differences in education, and regional education levels.
Education plays a key role in the development of a society. Therefore, by choosing this dataset for our project, we aim to determine the strengths of Turkey’s education system and areas that need improvement. We plan to make important inferences about education with this data.
The main goal of our project is to conduct an in-depth analysis of the state and trends of education in Turkey. In this context, we will seek answers to the following questions:
Are there differences in education levels between genders? Has education equality been achieved regionally? How has the level of education in Turkey progressed over the years?
By answering these questions, we aim to understand the problems in the education system and develop suggestions that can contribute to education policies in Turkey.
This project aims to analyze educational data from 2008 to 2023. The following preprocessing steps prepare the dataset for analysis:
Data loading and removal of empty columns
Partitioning data by years
Column header adjustment
Creating multi-level column headers
🛠️ Step 0: Manual Data Preprocessing in Excel
Before starting data analysis, the dataset was manually cleaned in Excel to ensure consistency and accuracy:
Empty Columns Removed: Completely empty columns were deleted.
City Name Standardization: Inconsistent city names (e.g., Afyonkarahisar → Afyon, İçel → Mersin) were corrected.
Unnecessary Rows Cleared: Rows containing summary totals (e.g., labeled as Türkiye) and metadata were removed.
Header Adjustments: Column headers were standardized, and missing values were filled.
This cleaned dataset (yıllara_göre_egitim.xlsx) was then used for further processing in Python. 🚀
⚙️ Step 1: Data Loading and Cleaning
The dataset is loaded using Python’s pandas library, and completely empty columns are removed.
import pandas as pd# Load the dataset and remove completely empty columnsdf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')
Description:
- The dataset is loaded using the pandas library.
- Completely empty columns are removed from the dataset.
📅 Step 2: Partitioning Data by Years
The dataset is partitioned into separate data frames for each year.
# Define years and index rangesyears =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]
Description:
- The dataset is partitioned into yearly data frames.
- Index ranges are calculated for each year.
🔍 Step 3: Viewing Sample Years
Sample records from the first three years are examined.
# Inspect sample data frames from the first few yearssample_years = {year: data.head(1) for year, data inlist(yearly_data.items())[:3]}# Display sample data for verificationfor year, sample in sample_years.items():print(f"Sample data for year {year}:")print(sample)
Description:
- Data frames for the first three years are inspected.
- Header structure is validated.
- Sample data from the first row of each year’s data frame is displayed.
📝 Step 4: Adjusting Column Headers
The first two rows are used as column headers, and missing values are filled.
import pandas as pddf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')years =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# Use the first two rows as headers and fill missing valuesheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Create multi-level (MultiIndex) headersdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# Remove the first two rows and reset the datasetdf = df.iloc[2:].reset_index(drop=True)
Description:
- The first two rows are used as column headers.
- Missing values are filled (bfill).
- Multi-level headers are created.
📑 Step 5: Structuring Column Headers
Column headers are further structured.
Code Details
import pandas as pddf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')years =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# Use the first two rows as headers and fill missing valuesheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Create multi-level (MultiIndex) headersdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# Remove the first two rows and reset the datasetdf = df.iloc[2:].reset_index(drop=True)# Create single-level headers for the first three columnsfirst_three_headers = ['Year', 'Province Code', 'Province Name']# Define main and sub-headersmain_headers = ['Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen','İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek','Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen']sub_headers = ['Toplam', 'Erkek', 'Kadın']# Create multi-level headersremaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]# Combine all headersfinal_headers = first_three_headers + remaining_headers[:df.shape[1] -len(first_three_headers)]df.columns = pd.MultiIndex.from_tuples( [(header, '') ifisinstance(header, str) else header for header in final_headers])
Description:
- Single-level headers are assigned to the first three columns.
- Multi-level headers are created for remaining columns.
- Final column structure is established.
📊 Step 6: Data Structure Verification
# Display dataset informationprint(df.info())# Display the first few rows of the datasetprint(df.head())
Code Details
import pandas as pd# Veri setini yükle ve tamamen boş sütunları kaldırdf = pd.read_excel('/home/mami/emu430/emu430-fall2024-team-education_atlas/data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')# Yılları ve indeks aralıklarını tanımlayears =list(range(2008, 2024)) # 2008'den 2023'e kadarindex_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Her yıl için alt veri çerçevelerini oluşturyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# İlk birkaç yılın veri çerçevelerini kontrol etsample_years = {year: data.head(1) for year, data inlist(yearly_data.items())[:3]}# İlk iki satırı başlık olarak kullan ve eksik değerleri doldurheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Çok seviyeli (MultiIndex) başlıklar oluşturdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# İlk iki satırı kaldır ve veri setini sıfırladf = df.iloc[2:].reset_index(drop=True)# İlk 3 sütun için tek seviyeli başlıklar oluşturfirst_three_headers = ['Yıl', 'İl Kodu', 'İl Adı']# Ana başlıklar ve alt başlıkları tanımlamain_headers = ['Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen','İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek','Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen']sub_headers = ['Toplam', 'Erkek', 'Kadın']# Çok seviyeli başlıkları oluşturremaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]# Tüm başlıkları birleştirfinal_headers = first_three_headers + remaining_headers[:df.shape[1] -len(first_three_headers)]df.columns = pd.MultiIndex.from_tuples( [(header, '') ifisinstance(header, str) else header for header in final_headers])print(df.loc[df['Yıl'] ==2009])
Yıl İl Kodu İl Adı Genel toplam \
Toplam Erkek Kadın
81 2009 NaN Türkiye 65049093 32612307 32436786
82 2009 1 Adana 1838483 914799 923684
83 2009 2 Adıyaman 514226 256220 258006
84 2009 3 Afyonkarahisar 632466 312888 319578
85 2009 4 Ağrı 453257 236508 216749
.. ... ... ... ... ... ...
158 2009 77 Yalova 186120 93011 93109
159 2009 78 Karabük 202539 101087 101452
160 2009 79 Kilis 106205 52316 53889
161 2009 80 Osmaniye 416123 208100 208023
162 2009 81 Düzce 304102 151908 152194
Okuma yazma bilmeyen \
Toplam Erkek Kadın
81 4672257 915054 3757203
82 148634 29472 119162
83 64877 14374 50503
84 46634 7243 39391
85 65454 15413 50041
.. ... ... ...
158 6853 1197 5656
159 16355 3022 13333
160 11429 2476 8953
161 36910 7244 29666
162 18403 3211 15192
Okuma yazma bilen fakat bir okul bitirmeyen ... Yüksekokul veya fakülte \
Toplam ... Kadın
81 13517214 ... 1786379
82 398108 ... 48596
83 135198 ... 5811
84 124173 ... 11334
85 188673 ... 2408
.. ... ... ...
158 30755 ... 5630
159 34413 ... 5071
160 27230 ... 1478
161 93989 ... 7797
162 59972 ... 5868
Yüksek lisans Doktora Bilinmeyen \
Toplam Erkek Kadın Toplam Erkek Kadın Toplam Erkek
81 279268 166285 112983 95500 61301 34199 3032457 1657575
82 5570 3173 2397 2044 1312 732 80015 45750
83 567 406 161 151 120 31 22234 12751
84 1610 1018 592 588 447 141 15534 8827
85 459 344 115 151 101 50 48889 26122
.. ... ... ... ... ... ... ... ...
158 673 445 228 129 88 41 13696 7229
159 456 309 147 172 133 39 5506 3070
160 126 86 40 65 55 10 4504 2522
161 591 396 195 106 68 38 17466 9477
162 668 420 248 265 191 74 10764 5716
Kadın
81 1374882
82 34265
83 9483
84 6707
85 22767
.. ...
158 6467
159 2436
160 1982
161 7989
162 5048
[82 rows x 36 columns]
🚀 R’de .RData formatında kaydetmek için aşağıdaki komutları kullanabilirsiniz:
import pandas as pd
# Veri setini yükle ve tamamen boş sütunları kaldır
df = pd.read_excel('/home/mami/emu430/emu430-fall2024-team-education_atlas/data/yıllara_göre_egitim.xlsx')
df = df.dropna(axis=1, how='all')
# Yılları ve indeks aralıklarını tanımla
years = list(range(2008, 2024)) # 2008'den 2023'e kadar
index_ranges = [(1 + i * 82, 1 + (i + 1) * 82) for i in range(len(years))]
# Her yıl için alt veri çerçevelerini oluştur
yearly_data = {}
for year, (start, end) in zip(years, index_ranges):
yearly_data[year] = df.iloc[start:end]
# İlk birkaç yılın veri çerçevelerini kontrol et
sample_years = {year: data.head(1) for year, data in list(yearly_data.items())[:3]}
# İlk iki satırı başlık olarak kullan ve eksik değerleri doldur
header1 = df.iloc[0].bfill()
header2 = df.iloc[1].bfill()
# Çok seviyeli (MultiIndex) başlıklar oluştur
df.columns = pd.MultiIndex.from_tuples(zip(header1, header2))
# İlk iki satırı kaldır ve veri setini sıfırla
df = df.iloc[2:].reset_index(drop=True)
# İlk 3 sütun için tek seviyeli başlıklar oluştur
first_three_headers = ['Yıl', 'İl Kodu', 'İl Adı']
# Ana başlıklar ve alt başlıkları tanımla
main_headers = [
'Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen',
'İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek',
'Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen'
]
sub_headers = ['Toplam', 'Erkek', 'Kadın']
# Çok seviyeli başlıkları oluştur
remaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]
# Tüm başlıkları birleştir
final_headers = first_three_headers + remaining_headers[:df.shape[1] - len(first_three_headers)]
df.columns = pd.MultiIndex.from_tuples(
[(header, '') if isinstance(header, str) else header for header in final_headers]
)
# Veri setini bir dosyaya kaydet
import pyreadr
# Veri çerçevesini Rdata olarak kaydet
pyreadr.write_rds('data/education_data.RData', df)
Code Details
# Veri setini bir dosyaya kaydetimport pyreadr# Veri çerçevesini Rdata olarak kaydetpyreadr.write_rds('education_data.RData', df)
R’de kullanılmak üzere açıklama
✅ Veri seti başarıyla kaydedildi: “data/education_data.RData” Rdata
---title: "Data"---# 1. General Information About DataThe Turkish Statistical Institute (TÜİK) is motivated to ensure the production, publication and distribution of necessary statistics while compiling Turkey's education data and information. TÜİK collects data from individuals, households, workplaces through surveys and censuses. The education data it collects helps to compare education information and gives an idea of what education needs a country has for its development.**Data Source:** [TÜİK](https://data.tuik.gov.tr/Bulten/Index?p=Ulusal-Egitim-Istatistikleri-2023-53444)# 2. Why We Choose Data and Our PurposesAs part of the EMU430 course, we decided to examine the National Education Statistics dataset published by the Turkish Statistical Institute (TÜİK). This dataset provides comprehensive information about the state of education in Turkey. It is particularly suitable for analyzing issues such as reading rates, gender differences in education, and regional education levels.Education plays a key role in the development of a society. Therefore, by choosing this dataset for our project, we aim to determine the strengths of Turkey's education system and areas that need improvement. We plan to make important inferences about education with this data.The main goal of our project is to conduct an in-depth analysis of the state and trends of education in Turkey. In this context, we will seek answers to the following questions:Are there differences in education levels between genders? Has education equality been achieved regionally? How has the level of education in Turkey progressed over the years?By answering these questions, we aim to understand the problems in the education system and develop suggestions that can contribute to education policies in Turkey.# 3. Data Preprocessing and EDA Analysis---title: "Data Preprocessing Report"author: "Education Atlas "date: todayformat: html: toc: true toc-depth: 2 code-fold: true code-summary: "Code Details"execute: echo: true warning: false error: true---# 📊 IntroductionThis project aims to analyze **educational data from 2008 to 2023**. The following preprocessing steps prepare the dataset for analysis:1. Data loading and removal of empty columns 2. Partitioning data by years 3. Column header adjustment 4. Creating multi-level column headers ---# 🛠️ Step 0: Manual Data Preprocessing in ExcelBefore starting data analysis, the dataset was manually cleaned in Excel to ensure consistency and accuracy:1. Empty Columns Removed: Completely empty columns were deleted.2. City Name Standardization: Inconsistent city names (e.g., Afyonkarahisar → Afyon, İçel → Mersin) were corrected.3. Unnecessary Rows Cleared: Rows containing summary totals (e.g., labeled as Türkiye) and metadata were removed.4. Header Adjustments: Column headers were standardized, and missing values were filled.This cleaned dataset (yıllara_göre_egitim.xlsx) was then used for further processing in Python. 🚀# ⚙️ Step 1: Data Loading and CleaningThe dataset is loaded using Python's `pandas` library, and completely empty columns are removed.```pythonimport pandas as pd# Load the dataset and remove completely empty columnsdf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')```**Description:** - The dataset is loaded using the `pandas` library. - Completely empty columns are removed from the dataset.---# 📅 Step 2: Partitioning Data by YearsThe dataset is partitioned into separate data frames for each year.```python# Define years and index rangesyears =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]```**Description:** - The dataset is partitioned into yearly data frames. - Index ranges are calculated for each year.---# 🔍 Step 3: Viewing Sample YearsSample records from the first three years are examined.```python# Inspect sample data frames from the first few yearssample_years = {year: data.head(1) for year, data inlist(yearly_data.items())[:3]}# Display sample data for verificationfor year, sample in sample_years.items():print(f"Sample data for year {year}:")print(sample)```**Description:** - Data frames for the first three years are inspected. - Header structure is validated. - Sample data from the first row of each year's data frame is displayed.---# 📝 Step 4: Adjusting Column HeadersThe first two rows are used as column headers, and missing values are filled.```pythonimport pandas as pddf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')years =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# Use the first two rows as headers and fill missing valuesheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Create multi-level (MultiIndex) headersdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# Remove the first two rows and reset the datasetdf = df.iloc[2:].reset_index(drop=True)```**Description:** - The first two rows are used as column headers. - Missing values are filled (`bfill`). - Multi-level headers are created.---# 📑 Step 5: Structuring Column HeadersColumn headers are further structured.```{python}import pandas as pddf = pd.read_excel('data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')years =list(range(2008, 2024)) # From 2008 to 2023index_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Create sub-data frames for each yearyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# Use the first two rows as headers and fill missing valuesheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Create multi-level (MultiIndex) headersdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# Remove the first two rows and reset the datasetdf = df.iloc[2:].reset_index(drop=True)# Create single-level headers for the first three columnsfirst_three_headers = ['Year', 'Province Code', 'Province Name']# Define main and sub-headersmain_headers = ['Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen','İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek','Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen']sub_headers = ['Toplam', 'Erkek', 'Kadın']# Create multi-level headersremaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]# Combine all headersfinal_headers = first_three_headers + remaining_headers[:df.shape[1] -len(first_three_headers)]df.columns = pd.MultiIndex.from_tuples( [(header, '') ifisinstance(header, str) else header for header in final_headers])```**Description:** - Single-level headers are assigned to the first three columns. - Multi-level headers are created for remaining columns. - Final column structure is established.---# 📊 Step 6: Data Structure Verification```python# Display dataset informationprint(df.info())# Display the first few rows of the datasetprint(df.head())``````{python}import pandas as pd# Veri setini yükle ve tamamen boş sütunları kaldırdf = pd.read_excel('/home/mami/emu430/emu430-fall2024-team-education_atlas/data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')# Yılları ve indeks aralıklarını tanımlayears =list(range(2008, 2024)) # 2008'den 2023'e kadarindex_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Her yıl için alt veri çerçevelerini oluşturyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# İlk birkaç yılın veri çerçevelerini kontrol etsample_years = {year: data.head(1) for year, data inlist(yearly_data.items())[:3]}# İlk iki satırı başlık olarak kullan ve eksik değerleri doldurheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Çok seviyeli (MultiIndex) başlıklar oluşturdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# İlk iki satırı kaldır ve veri setini sıfırladf = df.iloc[2:].reset_index(drop=True)# İlk 3 sütun için tek seviyeli başlıklar oluşturfirst_three_headers = ['Yıl', 'İl Kodu', 'İl Adı']# Ana başlıklar ve alt başlıkları tanımlamain_headers = ['Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen','İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek','Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen']sub_headers = ['Toplam', 'Erkek', 'Kadın']# Çok seviyeli başlıkları oluşturremaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]# Tüm başlıkları birleştirfinal_headers = first_three_headers + remaining_headers[:df.shape[1] -len(first_three_headers)]df.columns = pd.MultiIndex.from_tuples( [(header, '') ifisinstance(header, str) else header for header in final_headers])print(df.loc[df['Yıl'] ==2009])```# 🚀 R'de .RData formatında kaydetmek için aşağıdaki komutları kullanabilirsiniz:```{Python}import pandas as pd# Veri setini yükle ve tamamen boş sütunları kaldırdf = pd.read_excel('/home/mami/emu430/emu430-fall2024-team-education_atlas/data/yıllara_göre_egitim.xlsx')df = df.dropna(axis=1, how='all')# Yılları ve indeks aralıklarını tanımlayears =list(range(2008, 2024)) # 2008'den 2023'e kadarindex_ranges = [(1+ i *82, 1+ (i +1) *82) for i inrange(len(years))]# Her yıl için alt veri çerçevelerini oluşturyearly_data = {}for year, (start, end) inzip(years, index_ranges): yearly_data[year] = df.iloc[start:end]# İlk birkaç yılın veri çerçevelerini kontrol etsample_years = {year: data.head(1) for year, data inlist(yearly_data.items())[:3]}# İlk iki satırı başlık olarak kullan ve eksik değerleri doldurheader1 = df.iloc[0].bfill()header2 = df.iloc[1].bfill()# Çok seviyeli (MultiIndex) başlıklar oluşturdf.columns = pd.MultiIndex.from_tuples(zip(header1, header2))# İlk iki satırı kaldır ve veri setini sıfırladf = df.iloc[2:].reset_index(drop=True)# İlk 3 sütun için tek seviyeli başlıklar oluşturfirst_three_headers = ['Yıl', 'İl Kodu', 'İl Adı']# Ana başlıklar ve alt başlıkları tanımlamain_headers = ['Genel toplam', 'Okuma yazma bilmeyen', 'Okuma yazma bilen fakat bir okul bitirmeyen','İlkokul', 'İlköğretim', 'Ortaokul ve dengi meslek', 'Lise ve dengi meslek','Yüksekokul veya fakülte', 'Yüksek lisans', 'Doktora', 'Bilinmeyen']sub_headers = ['Toplam', 'Erkek', 'Kadın']# Çok seviyeli başlıkları oluşturremaining_headers = [(main, sub) for main in main_headers for sub in sub_headers]# Tüm başlıkları birleştirfinal_headers = first_three_headers + remaining_headers[:df.shape[1] -len(first_three_headers)]df.columns = pd.MultiIndex.from_tuples( [(header, '') ifisinstance(header, str) else header for header in final_headers])# Veri setini bir dosyaya kaydetimport pyreadr# Veri çerçevesini Rdata olarak kaydetpyreadr.write_rds('data/education_data.RData', df)``````{python}# Veri setini bir dosyaya kaydetimport pyreadr# Veri çerçevesini Rdata olarak kaydetpyreadr.write_rds('education_data.RData', df)```# R'de kullanılmak üzere açıklama✅ Veri seti başarıyla kaydedildi: "data/education_data.RData" [Rdata](education_data.RData)### Veri Seti BağlantısıVeri setine [buradan ulaşabilirsiniz](education_data.RData).**Description:** - The dataset's column structure is inspected. - The first few rows are displayed to validate adjustments.---# ✅ Summary and ResultsAfter the preprocessing steps:1. The dataset was successfully loaded. 2. Yearly data partitioning was completed. 3. Multi-level column headers were structured. 4. Sample data from selected years was displayed for verification. ---# 🚀 Next Steps- Missing value analysis and filling strategies - Data normalization and scaling - Visualization and analysis steps **This report provides a detailed overview of the data preprocessing steps and prepares the dataset for further analysis.** 🚀